Semantic Similarity Measure Using Information Content Approach With Depth For Similarity Calculation

Author

  • Atul Gupta
Abstract

Similarity is a measure of the nearness or proximity between two concepts. Several algorithmic approaches for computing similarity have been proposed. Most existing similarity measures use WordNet as the underlying ontology for calculating semantic similarity. WordNet is a lexical database for the English language, created and maintained by the Cognitive Science Laboratory at Princeton University under the supervision of Professor George A. Miller. It is organized as a network of concepts or terms called synsets (lists of synonymous terms) and the relationships between them. WordNet contains different types of relationships, such as is-a, part-of, synonym, and antonym. It has three databases: one for nouns, one for verbs, and one for adverbs and adjectives. This project proposes a metric for calculating the semantic relatedness between a pair of concepts. The metric uses Tversky's feature-based approach, which takes into account the common and distinct features of the two terms or concepts. If the commonalities outweigh the differences, the similarity between the concepts is high; otherwise it is low. Tversky's theory is quantified by the information content of the two concepts and the information content of their most specific common ancestor. Moving down the WordNet hierarchy leads to more specific and more informative concepts, whereas moving up leads to more general and less informative ones. The depth of a concept in the WordNet hierarchy is therefore a critical factor in similarity calculation. We take into account the depth of each concept in the WordNet hierarchy, which determines the relevance of the distinct features specific to that concept in the similarity calculation. Introducing depth reduces the impact of less relevant dissimilarities on the similarity calculation, thereby increasing precision.
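The abstract does not state the formula itself, so the following is only a minimal sketch of the general idea, not the author's metric. It assumes Tversky's ratio model with equal weights, uses the information content (IC) of the most specific common ancestor as the common-feature term, and scales each distinct-feature term by a hypothetical depth weight (`depth / MAX_DEPTH`). The taxonomy, the probabilities, and the direction of the depth weighting are all illustrative assumptions.

```python
import math

# Toy is-a taxonomy (child -> parent). Illustrative only, not real WordNet data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}

# Made-up concept probabilities; a parent is at least as probable as its children.
PROB = {"entity": 1.0, "animal": 0.4, "artifact": 0.5,
        "dog": 0.1, "cat": 0.1, "car": 0.2}


def ancestors(concept):
    """Return the path from `concept` up to the root, starting with `concept`."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Number of is-a edges between `concept` and the root (root has depth 0)."""
    return len(ancestors(concept)) - 1


def ic(concept):
    """Information content: negative log of the concept's probability."""
    return -math.log(PROB[concept])


def most_specific_common_ancestor(a, b):
    """Deepest concept that subsumes both `a` and `b`."""
    ancestors_of_b = set(ancestors(b))
    for candidate in ancestors(a):  # walks from `a` upward, so deepest match first
        if candidate in ancestors_of_b:
            return candidate
    raise ValueError("concepts share no ancestor")


MAX_DEPTH = max(depth(c) for c in PARENT)


def similarity(a, b):
    """Tversky-style ratio with IC features and a hypothetical depth weight."""
    if a == b:
        return 1.0
    common = ic(most_specific_common_ancestor(a, b))
    # Assumption: distinct features of deeper concepts count more.
    distinct_a = (ic(a) - common) * depth(a) / MAX_DEPTH
    distinct_b = (ic(b) - common) * depth(b) / MAX_DEPTH
    denominator = common + 0.5 * (distinct_a + distinct_b)
    return common / denominator if denominator > 0 else 0.0
```

In this toy setup, `similarity("dog", "cat")` is higher than `similarity("dog", "car")`, because the first pair shares the informative ancestor `animal` while the second shares only the root, whose information content is zero.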
We carried out our experiment on the 28 word pairs common to the Rubenstein-Goodenough and Miller-Charles sets. These word pairs range from low through intermediate to high similarity. Evaluation is done by comparing the similarity values calculated using the proposed measure with the human ratings. We use the Pure Java WordNet Similarity Library to implement the proposed metric. Experimental results show that the proposed metric is on par with existing similarity measures and superior to some of the traditional ones.
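The abstract does not say which statistic is used to compare computed values with human ratings. One common choice for this kind of benchmark is the Pearson correlation coefficient, so the sketch below assumes that; the function name and the sample values in the usage note are placeholders, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)
```

A call such as `pearson(human_ratings, computed_scores)`, with one entry per word pair in each list, returns a value between -1 and 1; values closer to 1 indicate closer agreement with the human judgments.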

Similar resources

A Novel Information Theoretic Framework for Finding Semantic Similarity in WordNet

Information content (IC) based measures for finding semantic similarity is gaining preferences day by day. Semantics of concepts can be highly characterized by information theory. The conventional way for calculating IC is based on the probability of appearance of concepts in corpora. Due to data sparseness and corpora dependency issues of those conventional approaches, a new corpora independen...

Developing a Semantic Similarity Judgment Test for Persian Action Verbs and Non-action Nouns in Patients With Brain Injury and Determining its Content Validity

Objective: Brain trauma evidences suggest that the two grammatical categories of noun and verb are processed in different regions of the brain due to differences in the complexity of grammatical and semantic information processing. Studies have shown that the verbs belonging to different semantic categories lead to neural activity in different areas of the brain, and action verb processing is r...

A Survey on Semantic Similarity Measure

Measuring semantic similarity between concepts is an important problem in web mining and text mining which needs semantic content matching. Semantic similarity has attracted great concern for a long time in artificial intelligence, psychology and cognitive science. Many measures have been proposed. The paper contains a review of the state of art measures including path based measures, informati...

A procedure for Web Service Selection Using WS-Policy Semantic Matching

In general, Policy-based approaches play an important role in the management of web services, for instance, in the choice of semantic web service and quality of services (QoS) in particular. The present research work illustrates a procedure for the web service selection among functionality similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...

Automatic Hashtag Recommendation in Social Networking and Microblogging Platforms Using a Knowledge-Intensive Content-based Approach

In social networking/microblogging environments, #tag is often used for categorizing messages and marking their key points. Also, since some social networks such as twitter apply restrictions on the number of characters in messages, #tags can serve as a useful tool for helping users express their messages. In this paper, a new knowledge-intensive content-based #tag recommendation system is intr...

Publication date: 2014